git status - to show which files are modified, staged, or untracked, and whether the local branch is ahead of or behind the remote branch (as of the last fetch)
git fetch then git pull - to pull the latest files from the remote (master) repo into the local repo
git add . - to place files in the staging area
git commit -m "Commit message" - to commit the staged files along with a message describing the change
git push - to push the committed files to the remote (master) repo
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+1234155628098
## [1] 1.234157e+12
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
#Load Library
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.3.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
#Exercise 1
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names = 1, sep="\t", na.strings="NAN")
OTU = read.table(file="Saanich.OTU.txt", header=TRUE, row.names = 1, sep="\t", na.strings="NAN")
#Exercise 2
metadata %>% rownames_to_column('sample') %>%
filter(CH4_nM >=100, Temperature_C<=10) %>%
column_to_rownames('sample') %>%
select(Depth_m,CH4_nM,Temperature_C)
#Exercise 3
nM_to_uM_Metadata_coversion <-metadata %>% rownames_to_column('sample') %>%
select(matches("nM"), matches('sample')) %>%
mutate(N2O_uM = N2O_nM/1000, Std_N2O_uM = Std_N2O_nM/1000, CH4_uM = CH4_nM/1000, Std_CH4_uM = Std_CH4_nM/1000) %>%
column_to_rownames('sample')
#For Exercise 3: All variables measured in nM are converted to μM. The output table, nM_to_uM_Metadata_coversion, shows only the original nM variables and the converted μM variables.
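The same nM to μM conversion can be written without naming each column by hand. The following is a minimal base R sketch, assuming a small made-up data frame; `toy_metadata` and its values are hypothetical and are not the Saanich data.

```r
# Hypothetical stand-in for the metadata table; the values are made up
toy_metadata <- data.frame(
  Depth_m = c(10, 100, 200),
  CH4_nM  = c(50, 150, 2000),
  N2O_nM  = c(1.5, 20, 0.5)
)

# Find every column whose name ends in "_nM"
nM_cols <- grep("_nM$", names(toy_metadata), value = TRUE)

# Build the matching "_uM" names, then divide by 1000 (1 uM = 1000 nM)
uM_cols <- sub("_nM$", "_uM", nM_cols)
toy_metadata[uM_cols] <- toy_metadata[nM_cols] / 1000

toy_metadata
```

Matching on the column suffix means a new nM variable is picked up automatically, at the cost of relying on a consistent naming scheme.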
library("tidyverse")
source("https://bioconductor.org/biocLite.R")
## Bioconductor version 3.6 (BiocInstaller 1.28.0), ?biocLite for help
biocLite("phyloseq")
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.3 (2017-11-30).
## Installing package(s) 'phyloseq'
##
## The downloaded binary packages are in
## /var/folders/z3/h3tm9hss3bx6zbhrqjq0f5y80000gn/T//RtmpfPClb3/downloaded_packages
## Old packages: 'ade4', 'ape', 'bindr', 'bindrcpp', 'broom', 'callr',
## 'cluster', 'curl', 'foreign', 'glmnet', 'igraph', 'kableExtra',
## 'lubridate', 'Matrix', 'nlme', 'plogr', 'psych', 'Rcpp', 'readxl',
## 'selectr', 'stringi', 'survival', 'tinytex', 'vegan', 'withr'
library("phyloseq")
load("phyloseq_object.RData")
#Exercise 1
ggplot(metadata, aes(x=PO4_uM, y=Depth_m)) +
geom_point(color="purple", shape=17)
#Exercise 2
metadata %>%
mutate(Temperature_F= Temperature_C*9/5+32) %>%
ggplot() + geom_point(aes(x=Temperature_F, y=Depth_m))
#ggplot with phyloseq
plot_bar(physeq, fill="Phylum")
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Phylum")
plot_bar(physeq_percent, fill="Phylum") +
geom_bar(aes(fill=Phylum), stat="identity")
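To see what the function passed to transform_sample_counts() does, it can be applied to a plain count matrix. This is only a sketch in base R: the `counts` matrix and its taxon and sample names are hypothetical, and phyloseq itself is not used here.

```r
# Hypothetical count matrix: rows are taxa, columns are samples (made-up numbers)
counts <- matrix(
  c(10, 30, 60,
     5,  5, 40),
  nrow = 3,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_C"),
                  c("sample_1", "sample_2"))
)

# The same function that is passed to transform_sample_counts()
to_percent <- function(x) 100 * x / sum(x)

# Apply it to each sample (column) separately
percent <- apply(counts, 2, to_percent)

# Each sample now sums to 100, regardless of its original sequencing depth
colSums(percent)
```

Because the conversion is done within each sample, samples with very different total counts become directly comparable in the bar plots.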
#Exercise 3
plot_bar(physeq_percent, fill="Phylum", title = "Phyla from 10 to 200 m in Saanich Inlet") +
geom_bar(aes(fill=Phylum), stat="identity") +
labs(x="Sample depth", y="Percent relative abundance")
#Faceting
plot_bar(physeq_percent, fill="Phylum") +
geom_bar(aes(fill=Phylum), stat="identity") +
facet_wrap(~Phylum)
plot_bar(physeq_percent, fill="Phylum") +
geom_bar(aes(fill=Phylum), stat="identity") +
facet_wrap(~Phylum, scales="free_y") +
theme(legend.position="none")
#Exercise 4
plot_nutrients= metadata %>%
select(Depth_m, NH4_uM,NO2_uM, NO3_uM, O2_uM, PO4_uM, SiO2_uM) %>%
gather(Nutrients, Concentration, NH4_uM,NO2_uM, NO3_uM, O2_uM, PO4_uM, SiO2_uM)
ggplot(plot_nutrients, aes(x=Depth_m, y=Concentration)) +
geom_point() + geom_line() +facet_wrap(~Nutrients, scales="free_y") +
theme(legend.position = "none")
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
What were the main questions being asked?
What were the primary methodological approaches used?
For the aquatic environments, the volumes of oceanic water, freshwater and saline lakes, and polar regions were multiplied by the corresponding average cellular densities to calculate the number of cells in each region. For the polar regions in particular, the number of prokaryotes estimated by Delille & Rosiers and the mean areal extent of seasonal ice were also used in the calculation. For the soil, the authors relied on detailed direct counts from a coniferous forest ultisol, as it was generally considered representative of forest soil. For the subsurface, two approaches were used. The first is based on assumptions about the average porosity of the terrestrial subsurface (3%) and the fraction of total pore space occupied by prokaryotes (0.016%). The other multiplies the estimated number of prokaryotes at various groundwater sites by the total volume of groundwater on Earth.
Summarize the main results or findings.
Do new questions arise from the results?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
Kasting, JF, Siefert, JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296:1066-1068. doi: 10.1126/science.1071184. Link
Comment on the emergence of microbial life and the evolution of Earth systems
Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.
1.3 billion years ago
Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:
The oxygen also started oxidizing iron, forming the banded iron formations seen in sedimentary rock.
Once again, glaciation occurred at various periods.
Waters, CN, Zalasiewicz, J, Summerhayes, C, Barnosky, AD, Poirier, C, Galuszka, A, Cearreta, A, Edgeworth, M, Ellis, EC, Ellis, M, Jeandel, C, Leinfelder, R, McNeill, JR, Richter, DD, Steffen, W, Syvitski, J, Vidas, D, Wagreich, M, Williams, M, An, ZS, Grinevald, J, Odada, E, Oreskes, N, Wolfe, AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science. 351:137. doi: 10.1126/science.aad2622. Link
What were the main questions being asked?
What were the primary methodological approaches used?
Summarize the main results or findings.
Do new questions arise from the results?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.
What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacteria including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
population size x number of turnovers per year = cells per year
Example: Using the data for marine heterotrophs: 3.6 x 10^{28} cells x (365 days/year ÷ 16 days/turnover) = 8.2 x 10^{29} cells/year
What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
With 20% loss and 80% approx. retained, there is 4 x 0.72 = 2.88 petagrams of C/year for marine heterotrophs
43 petagrams C consumed /year / 2.88 petagrams C assimilated/year = 14.9 or 1 turnover every 24.5 days
The variation of carbon assimilation with depth is primarily due to the different carbon production and composition of the microorganisms found in each habitat.
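The turnover arithmetic in this answer can be checked in a few lines of R. This sketch uses only the figures quoted in the answer; the variable names are illustrative.

```r
# Figures quoted in the answer
carbon_per_turnover_Pg <- 4 * 0.72  # Pg C needed per turnover
carbon_consumed_Pg_yr  <- 43        # Pg C consumed per year

# How many turnovers that consumption supports, and how long each one takes
turnovers_per_year <- carbon_consumed_Pg_yr / carbon_per_turnover_Pg
days_per_turnover  <- 365 / turnovers_per_year

round(turnovers_per_year, 1)
round(days_per_turnover, 1)
```

Carried through without intermediate rounding, this gives about 14.9 turnovers per year, or one turnover roughly every 24 to 25 days.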
How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^{-7} per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)
365 days/year ÷ 16 days/turnover = 22.8 turnovers/year
3.6 x 10^{28} cells x 22.8 turnovers/year = 8.2 x 10^{29} cells/year
(4 x 10^{-7})^4 = 2.56 x 10^{-26} mutations/generation for four simultaneous mutations
8.2 x 10^{29} cells/year x 2.56 x 10^{-26} mutations/generation = 2.1 x 10^{4} mutations/year
2.1 x 10^{4} mutations/year is about 2.4 mutations/hour, or one every 0.4 hours, as stated in the paper (1).
With a fast turnover rate and a large population size, these numbers are plausible for microbial populations.
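The chain of calculations above can be reproduced in R as a quick check. This is a sketch using the marine heterotroph figures from the answer; the variable names are illustrative and do not come from the paper.

```r
population    <- 3.6e28  # marine heterotroph cells in the upper 200 m
turnover_days <- 16      # days per turnover
mutation_rate <- 4e-7    # mutations per gene per DNA replication

# Annual cellular production
turnovers_per_year <- 365 / turnover_days
cells_per_year     <- population * turnovers_per_year

# Chance of four simultaneous mutations in one gene in a single replication
four_mutation_rate <- mutation_rate^4

# Expected number of such events per year, and the waiting time between them
events_per_year <- cells_per_year * four_mutation_rate
hours_per_event <- (365 * 24) / events_per_year

signif(events_per_year, 2)
signif(hours_per_event, 1)
```

The result is on the order of 2 x 10^4 events per year, which corresponds to one event roughly every 0.4 hours.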
Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
Falkowski, PG, Fenchel, T, Delong, EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320:1034-1039. doi: 10.1126/science.1153213. Link
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
The primary geophysical processes are tectonics and atmospheric photochemical processes (e.g. erosion and geothermal activity), which continuously supply substrates and remove products on Earth. These processes allow interactions of elements and molecules and cycles of chemical bond formation and cleavage that ultimately drive planetary chemistry toward thermodynamic equilibrium.
The primary biogeochemical processes involve the cycling of the six major elements: H, C, N, O, S, and P. The cycles of the first five elements in the list are largely driven by microbes through thermodynamically constrained redox reactions. In addition, volcanism and rock weathering contribute to nutrient cycling on Earth. These events resupply C, S, and P.
Abiotic processes are based on acid/base chemistry (i.e., transfers of protons without electrons), while biotic processes are based on redox reactions (i.e., successive transfers of electrons and protons from a relatively limited set of chemical elements). These processes are interconnected in that together they set the lower limits on the external energy needed to sustain the biogeochemical cycles on Earth.
Why is Earth’s redox state considered an emergent property?
How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
We can look at the nitrogen cycle at different points but for the purpose of explaining a cycle, we can start at the nitrogen fixation step as the starting reference point. Nitrogen fixation converts N2 to NH4 which allows for N2 to become accessible for synthesis of proteins and nucleic acids in organisms.
In the presence of O2, NH4 is first oxidized to nitrite (NO2) by a specific group of bacteria or archaea, which then is oxidized to nitrate (NO3) by a different set of nitrifying bacteria. The redox potential involved in oxidation is used by nitrifiers to reduce CO2 into organic matter.
In the absence of O2, a different group of microbes may use NO2 and NO3 as electron acceptors in anaerobic oxidation to eventually produce N2. This closes the N-cycle.
Quoted from the text, the N-cycle “forms an interdependent electron pool that is influenced by photosynthetic production of oxygen and the availability of organic matter”. Sunlight availability is affected by climate change. This, in turn, affects the N-cycle, in which photosynthetic organisms that require nitrogen oxides as terminal electron acceptors are involved. On the other hand, the N-cycle can also have a positive impact on climate change. Nitrifying organisms can use the oxidation of NH4 or NO2 to reduce CO2 into organic matter, which may then lessen the greenhouse effect by decreasing CO2 levels.
What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
Different redox reactions that give rise to a certain element or molecule are partitioned among different microbial groups as part of their metabolic pathways. As an example, methanogenic Archaea use carbon dioxide and hydrogen to form methane. This is distinct from hydrogen-consuming sulfate reducers, which also use hydrogen. With their different metabolic functions, each microbial group can have a specialized role in a given community. With more microbial groups specialized in different metabolic pathways, we would expect a higher microbial diversity.
Microbes are able to transfer genes to other microbes via horizontal gene transfer. The transferred genes are retained due to the presence of selective pressures in the environment. The transferred genes may encode part of or an entire metabolic pathway. Following the central dogma of biology, newly transferred genes are expected to be translated into new proteins.
With some parts of a metabolic pathway distributed among different microbial groups, there would be an increase in microbial diversity within a given environment. Under specific conditions, different microbes will need to transcribe and translate particular genes and produce the proteins required for their survival. Thus, this relates to the discovery of new protein families from microbial community genomes, whereby these environment-specific genes in a given microorganism are tuned by a particular habitat.
On what basis do the authors consider microbes the guardians of metabolism?
Achenbach, J. 2012. Spaceship Earth: A new view of environmentalism. Washington Post. Link.
Budny, JA. 2017. Book Review: Aerobiology—The Toxicology of Airborne Pathogens and Toxins. International Journal of Toxicology. 36:50-51. doi: 10.1177/1091581816678191. Link
Canfield, DE, Glazer, AN, Falkowski, PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science. 330:192-196. doi: 10.1126/science.1186120. 20929768
Falkowski, PG, Fenchel, T, Delong, EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320:1034-1039. doi: 10.1126/science.1153213. Link
Kasting, JF, Siefert, JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296:1066-1068. doi: 10.1126/science.1071184. Link
Schrag, DP. 2012. Geobiology of the Anthropocene. Fundamentals of Geobiology. 425-436. Link
Nisbet, EG, Sleep, NH. 2001. The habitat and nature of early life. Nature. 409:1083-1091. doi: 10.1038/35059210. Link
Rockström, J, Steffen, W, Noone, K, Scheffer, M, et al. 2009. A safe operating space for humanity. Nature. 461:472-475. doi: 10.1038/461472a. Link
Waters, CN, Zalasiewicz, J, Summerhayes, C, Barnosky, AD, Poirier, C, Galuszka, A, Cearreta, A, Edgeworth, M, Ellis, EC, Ellis, M, Jeandel, C, Leinfelder, R, McNeill, JR, Richter, DD, Steffen, W, Syvitski, J, Vidas, D, Wagreich, M, Williams, M, An, ZS, Grinevald, J, Odada, E, Oreskes, N, Wolfe, AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science. 351:137. doi: 10.1126/science.aad2622. Link
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863
Martinez, A, Bradley, AS, Waldbauer, JR, Summons, RE, DeLong, EF. 2007. Proteorhodopsin Photosystem Gene Expression Enables Photophosphorylation in a Heterologous Host. Proc. Natl. Acad. Sci. U. S. A. 104:5590-5595. doi: 10.1073/pnas.0611470104.
What were the main questions being asked?
What were the primary methodological approaches used?
Summarize the main results or findings.
Do new questions arise from the results?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
Wooley, JC, Godzik, A, Friedberg, I. 2010. A primer on metagenomics. PLoS Computational Biology. 6:e1000667. doi: 10.1371/journal.pcbi.1000667.
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?
The point is that most of life is uncultured. The only information we have about it comes from sequencing.
Solden, L, Lloyd, K, Wrighton, K. 2016. The bright side of microbial dark matter: lessons learned from the uncultivated majority. Curr. Opin. Microbiol. 31:217-226. doi: 10.1016/j.mib.2016.04.020.
Youssef, NH, Couger, MB, McCully, AL, Criado, AEG, Elshahed, MS. 2015. Assessing the global phylum level diversity within the bacterial domain: A review. Journal of Advanced Research. 6:269-282. doi: 10.1016/j.jare.2014.10.005.
Rappé, MS, Giovannoni, SJ. 2003. The uncultured microbial majority. Annual Reviews in Microbiology. 57:369-394. doi: 10.1146/annurev.micro.57.030502.090759.
What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
Binning: the process of grouping sequences that come from a single genome
Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?
Madsen, EL. 2005. Opinion: Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology. 3:439-446. doi: 10.1038/nrmicro1151. Link
Martinez, A, Bradley, AS, Waldbauer, JR, Summons, RE, DeLong, EF. 2007. Proteorhodopsin Photosystem Gene Expression Enables Photophosphorylation in a Heterologous Host. Proc. Natl. Acad. Sci. U. S. A. 104:5590-5595. doi: 10.1073/pnas.0611470104. 17372221
Rappé, MS, Giovannoni, SJ. 2003. The uncultured microbial majority. Annual Reviews in Microbiology. 57:369-394. doi: 10.1146/annurev.micro.57.030502.090759. 14527284
Solden, L, Lloyd, K, Wrighton, K. 2016. The bright side of microbial dark matter: lessons learned from the uncultivated majority. Curr. Opin. Microbiol. 31:217-226. doi: 10.1016/j.mib.2016.04.020. 27196505
Wooley, JC, Godzik, A, Friedberg, I. 2010. A primer on metagenomics. PLoS Computational Biology. 6:e1000667. doi: 10.1371/journal.pcbi.1000667. 20195499
Youssef, NH, Couger, MB, McCully, AL, Criado, AEG, Elshahed, MS. 2015. Assessing the global phylum level diversity within the bacterial domain: A review. Journal of Advanced Research. 6:269-282. doi: 10.1016/j.jare.2014.10.005. 26257925
Welch, RA, Burland, V, Plunkett, G, Redford, P, Roesch, P, Rasko, D, Buckles, EL, Liou, S-, Boutin, A, Hackett, J, Stroud, D, Mayhew, GF, Rose, DJ, Zhou, S, Schwartz, DC, Perna, NT, H. L. T. Mobley, Donnenberg, MS, Blattner, FR. 2002. Extensive Mosaic Structure Revealed by the Complete Genome Sequence of Uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 99:17020-17024. doi: 10.1073/pnas.252529799.
Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
Explain the relationship between microdiversity, genomic diversity and metabolic potential
Comment on the forces mediating divergence and cohesion in natural microbial communities
What were the main questions being asked?
How do the genomes of uropathogenic Escherichia coli strain CFT073, enterohemorrhagic E. coli EDL933, and laboratory strain MG1655 compare to each other? What makes each of them distinct from one another?
How do these differences relate to their phenotypes?
What were the primary methodological approaches used?
Summarize the main results or findings.
The genome of uropathogenic Escherichia coli strain CFT073 is circular and 5,231,428 bp long. No virulence plasmids were found in CFT073, since they are not usually associated with uropathogenic strains.
This suggested that there are differences in the genomes between pathogenic strains. This holds true when they are also compared to the benign strain.
As quoted from the text, different E. coli pathotypes have maintained a remarkable synteny of common, vertically evolved genes, whereas many islands interrupting this common backbone have been acquired by different horizontal transfer events in each strain.
With only 39.2% of the combined set of proteins shared among Escherichia coli strains CFT073, EDL933, and MG1655, what can be concluded from the study is that members of a species may share a low percentage of their genome content and be functionally different.
Do new questions arise from the results?
Should we now define species based on pathogenicity? What would be the cutoff?
Should we take a closer look at the niche in which each pathotype survives when it comes to defining what a species is?
Why aren’t virulence plasmids associated with uropathogenic strain CFT073 even though they are common to many E. coli isolates and usually associated with other uropathogenic strains?
Can we assess the presence of black holes, given the large number of genetic differences, with today’s technology?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.
The figure is a comparison between the locations and sizes of CFT073 and EDL933 islands. Island size is on the vertical axis; position in the colinear backbone is on the horizontal axis.
An ecotype describes a genetically distinct entity within a species which is genotypically adapted to specific environmental conditions. These different ecotypes or different strains of E. coli exhibit phenotypic differences. In the context
For uropathogenic strains of E. coli, island acquisition resulted in the capability to infect the urinary tract and bloodstream and evade host defenses without compromising the ability to harmlessly colonize the intestine.
For the different intestinal pathogens, acquired genes promote the colonization of specific regions of the intestine and new modes of interaction with the host tissue that produce clinically distinct variations of gastrointestinal disease.
What makes them the same species, or what makes them all E. coli, is the vertical transfer of ancestral backbone genes. The new genes found in different E. coli strains were acquired via numerous, independent horizontal gene transfer events.
Kunin, V, Engelbrektson, A, Ochman, H, Hugenholtz, P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12:118-123. doi: 10.1111/j.1462-2920.2009.02051.x.
Sogin, ML, Morrison, HG, Huber, JA, Welch, DM, Huse, SM, Neal, PR, Arrieta, JM, Herndl, GJ. 2006. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere”. Proc. Natl. Acad. Sci. U. S. A. 103:12115-12120. doi: 10.1073/pnas.0605127103.
In class Day 1:
Assignment:
In class Day 2:
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.
For example, load in the packages you will use.
#To make tables
library(knitr)
#To manipulate and plot data
library(tidyverse)
library(kableExtra)
Then load in the data. You should use a similar format to record your community data.
example_data1 = data.frame(
number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
name = c("M_n_Ms", "Kisses", "Skittles","Spheres", "Jolly_Ranchers", "Wine_gummies", "Octupi_gummies","Swirl_gummies", "Cherry_gummies", "Watermelon_gummies","Cola_gummies", "Classic_bear_gummies", "Sugar_coated_bear_gummies", "String", "Lego"),
characteristics = c("chocolate inside; 6 different colours", "chocolate inside; silver", "sugar inside ; 5 different colour","sugar inside; 3 different colours", "sugar inside; 5 different colour; elongated", "gummy; 2 different colours; matte", "gummy; pink and yellow; sugar coated; 7 legs","gummy; 2 different colours; sugar coated", "gummy; cherry-shaped; sugar coated", "gummy; watermelon-shaped; sugar coated","gummy; soda-shaped; sugar coated", "gummy; bear-shaped", "Sugar-gummy; bear-shaped; sugar coated", "gummy; red; string", "sugar inside; 3 different colour; lego-shaped, hard"),
occurences = c(52,1,39,6,38,2,0,0,1,0,1,23,1,1,5)
)
example_data2 = data.frame(
number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
name = c("M_n_Ms", "Kisses", "Skittles","Spheres", "Jolly_Ranchers", "Wine_gummies", "Octupi_gummies","Swirl_gummies", "Cherry_gummies", "Watermelon_gummies","Cola_gummies", "Classic_bear_gummies", "Sugar_coated_bear_gummies", "String", "Lego"),
characteristics = c("chocolate inside; 6 different colours", "chocolate inside; silver", "sugar inside ; 5 different colour","sugar inside; 3 different colours", "sugar inside; 5 different colour; elongated", "gummy; 2 different colours; matte", "gummy; pink and yellow; sugar coated; 7 legs","gummy; 2 different colours; sugar coated", "gummy; cherry-shaped; sugar coated", "gummy; watermelon-shaped; sugar coated","gummy; soda-shaped; sugar coated", "gummy; bear-shaped", "Sugar-gummy; bear-shaped; sugar coated", "gummy; red; string", "sugar inside; 3 different colour; lego-shaped, hard"),
occurences = c(214,16,197,19,131,6,6,3,1,1,3,101,3,14,17)
)
Finally, use these data to create a table.
example_data1 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | M_n_Ms | chocolate inside; 6 different colours | 52 |
| 2 | Kisses | chocolate inside; silver | 1 |
| 3 | Skittles | sugar inside ; 5 different colour | 39 |
| 4 | Spheres | sugar inside; 3 different colours | 6 |
| 5 | Jolly_Ranchers | sugar inside; 5 different colour; elongated | 38 |
| 6 | Wine_gummies | gummy; 2 different colours; matte | 2 |
| 7 | Octupi_gummies | gummy; pink and yellow; sugar coated; 7 legs | 0 |
| 8 | Swirl_gummies | gummy; 2 different colours; sugar coated | 0 |
| 9 | Cherry_gummies | gummy; cherry-shaped; sugar coated | 1 |
| 10 | Watermelon_gummies | gummy; watermelon-shaped; sugar coated | 0 |
| 11 | Cola_gummies | gummy; soda-shaped; sugar coated | 1 |
| 12 | Classic_bear_gummies | gummy; bear-shaped | 23 |
| 13 | Sugar_coated_bear_gummies | Sugar-gummy; bear-shaped; sugar coated | 1 |
| 14 | String | gummy; red; string | 1 |
| 15 | Lego | sugar inside; 3 different colour; lego-shaped, hard | 5 |
example_data2 %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = FALSE)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | M_n_Ms | chocolate inside; 6 different colours | 214 |
| 2 | Kisses | chocolate inside; silver | 16 |
| 3 | Skittles | sugar inside ; 5 different colour | 197 |
| 4 | Spheres | sugar inside; 3 different colours | 19 |
| 5 | Jolly_Ranchers | sugar inside; 5 different colour; elongated | 131 |
| 6 | Wine_gummies | gummy; 2 different colours; matte | 6 |
| 7 | Octupi_gummies | gummy; pink and yellow; sugar coated; 7 legs | 6 |
| 8 | Swirl_gummies | gummy; 2 different colours; sugar coated | 3 |
| 9 | Cherry_gummies | gummy; cherry-shaped; sugar coated | 1 |
| 10 | Watermelon_gummies | gummy; watermelon-shaped; sugar coated | 1 |
| 11 | Cola_gummies | gummy; soda-shaped; sugar coated | 3 |
| 12 | Classic_bear_gummies | gummy; bear-shaped | 101 |
| 13 | Sugar_coated_bear_gummies | Sugar-gummy; bear-shaped; sugar coated | 3 |
| 14 | String | gummy; red; string | 14 |
| 15 | Lego | sugar inside; 3 different colour; lego-shaped, hard | 17 |
For your community:
To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis against the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that decreases as more individuals are classified and fewer species remain to be identified. If sampling stops while the curve is still rising rapidly, sampling is incomplete and many species likely remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.
To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.
For example, we load in these data.
example_data3 = data.frame(
x = 1:170,
y = c(1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,7,8,8,8,8,8,8,8,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12)
)
And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.
ggplot(example_data3, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
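If you record your own sample as a sequence of candy names rather than as ready-made x and y values, the curve coordinates can be derived from that sequence. The sketch below is one way to do it in base R; the `observed` vector is made up for illustration, and the "second half" check is just a rough, informal way to see whether new species are still turning up.

```r
# Hypothetical sequence of candies, in the order they were classified
observed = c("Skittles", "Skittles", "M_n_Ms", "Skittles", "Kisses",
             "M_n_Ms", "Lego", "Skittles", "M_n_Ms", "Lego")

# x: cumulative number of individuals classified
x = seq_along(observed)

# y: cumulative number of distinct species seen so far;
# !duplicated() is TRUE only the first time each name appears
y = cumsum(!duplicated(observed))

collectors_curve = data.frame(x = x, y = y)
print(collectors_curve)

# Rough check of flattening: how many new species showed up
# in the second half of the sample?
half = floor(length(y) / 2)
new_in_second_half = y[length(y)] - y[half]
print(new_in_second_half)
```

The resulting `collectors_curve` data frame has the same two columns as `example_data3`, so it can be passed to the same `ggplot` call.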
For your sample:
Using the table from Part 1, calculate species diversity using the following indices or metrics.
\(\frac{1}{D}\) where \(D = \sum p_i^2\)
\(p_i\) = the fractional abundance of the \(i^{th}\) species
For example, using example data 1 (170 individuals across 15 candy types), \(\frac{1}{D}\) =
M_n_Ms = 52 /(170)
Kisses = 1 /(170)
Skittles = 39 /(170)
Spheres = 6 /(170)
Jolly_Ranchers = 38 /(170)
Wine_gummies = 2 /(170)
Octupi_gummies = 0 /(170)
Swirl_gummies = 0 /(170)
Cherry_gummies = 1 /(170)
Watermelon_gummies = 0 /(170)
Cola_gummies = 1 /(170)
Classic_bear_gummies = 23 /(170)
Sugar_coated_bear_gummies = 1 /(170)
String = 1 /(170)
Lego = 5 /(170)
1 / (M_n_Ms^2 + Kisses^2 + Skittles^2 + Spheres^2 + Jolly_Ranchers^2 + Wine_gummies^2 + Octupi_gummies^2 + Swirl_gummies^2 + Cherry_gummies^2 + Watermelon_gummies^2 + Cola_gummies^2 + Classic_bear_gummies^2 + Sugar_coated_bear_gummies^2 + String^2 + Lego^2)
## [1] 4.610721
# example data 2: the occurrences sum to 732 individuals
M_n_Ms = 214 /(732)
Kisses = 16 /(732)
Skittles = 197 /(732)
Spheres = 19 /(732)
Jolly_Ranchers = 131 /(732)
Wine_gummies = 6 /(732)
Octupi_gummies = 6 /(732)
Swirl_gummies = 3 /(732)
Cherry_gummies = 1 /(732)
Watermelon_gummies = 1 /(732)
Cola_gummies = 3 /(732)
Classic_bear_gummies = 101 /(732)
Sugar_coated_bear_gummies = 3 /(732)
String = 14 /(732)
Lego = 17 /(732)
1 / (M_n_Ms^2 + Kisses^2 + Skittles^2 + Spheres^2 + Jolly_Ranchers^2 + Wine_gummies^2 + Octupi_gummies^2 + Swirl_gummies^2 + Cherry_gummies^2 + Watermelon_gummies^2 + Cola_gummies^2 + Classic_bear_gummies^2 + Sugar_coated_bear_gummies^2 + String^2 + Lego^2)
## [1] 4.734682
The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects both the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that two communities can have the same number of species (equal richness) but a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.
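To make the richness-versus-evenness point concrete, here is a small base R sketch. The helper name `inv_simpson` and the `even` and `skewed` communities are made up for illustration; `counts1` holds the occurrences from example data 1.

```r
# Simpson Reciprocal Index (1/D) from a vector of counts per species
inv_simpson = function(counts) {
  p = counts / sum(counts)  # fractional abundance of each species
  1 / sum(p^2)
}

even   = c(10, 10, 10)  # hypothetical: 3 species, equal abundance
skewed = c(28, 1, 1)    # hypothetical: same richness, one dominant species

inv_simpson(even)    # 3, the number of species
inv_simpson(skewed)  # about 1.15

# Occurrences from example data 1
counts1 = c(52, 1, 39, 6, 38, 2, 0, 0, 1, 0, 1, 23, 1, 1, 5)
inv_simpson(counts1) # about 4.61
```

Both toy communities have three species, yet the skewed one scores barely above 1, which is what you would expect when a single species dominates.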
Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.
\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)
\(S_{obs}\) = total number of species observed
\(a\) = number of species observed once
\(b\) = number of species observed twice or more
So for example data 1, where 12 species were observed (5 of them once and 7 twice or more), and for example data 2, where 15 species were observed (2 once and 13 twice or more), \(S_{chao1}\) =
12 + 5^2/(7*2)
## [1] 13.78571
15 + 2^2/(13*2)
## [1] 15.15385
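Counting a and b by eye gets tedious for larger tables, so here is a sketch that pulls them straight from the occurrence counts. It follows the definitions given in this document (b counts species seen twice or more); the function name `chao1_estimate` is made up for illustration, and the function returns `Inf` if no species was seen at least twice.

```r
# Richness estimate using the definitions from this document:
#   s_obs = species observed at least once
#   a     = species observed exactly once
#   b     = species observed twice or more
chao1_estimate = function(counts) {
  s_obs = sum(counts > 0)
  a = sum(counts == 1)
  b = sum(counts >= 2)
  s_obs + a^2 / (2 * b)
}

# Occurrences from example data 1 and example data 2
counts1 = c(52, 1, 39, 6, 38, 2, 0, 0, 1, 0, 1, 23, 1, 1, 5)
counts2 = c(214, 16, 197, 19, 131, 6, 6, 3, 1, 1, 3, 101, 3, 14, 17)

chao1_estimate(counts1)  # same as 12 + 5^2/(7*2)
chao1_estimate(counts2)  # same as 15 + 2^2/(13*2)
```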
We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.
library(vegan)
First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table with species as columns and samples as rows (of which you only have 1).
example_data1_diversity =
  example_data1 %>%
  select(name, occurences) %>%
  spread(name, occurences)
example_data1_diversity
example_data2_diversity =
  example_data2 %>%
  select(name, occurences) %>%
  spread(name, occurences)
example_data2_diversity
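If it is not obvious what shape the reshaping step produces, this base R sketch shows the same idea on a hypothetical three-row table (the `long` and `wide` names are made up for illustration). It is an alternative to the `select` and `spread` calls, not a description of how they work internally.

```r
# Hypothetical long table: one row per species
long = data.frame(
  name = c("Kisses", "Lego", "Skittles"),
  occurences = c(1, 5, 39)
)

# Name the counts by species, then transpose into a single row:
# one column per species, one row for the one sample
wide = as.data.frame(t(setNames(long$occurences, long$name)))
print(wide)
```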
Then we can calculate the Simpson Reciprocal Index using the diversity function.
diversity(example_data1_diversity, index="invsimpson")
## [1] 4.610721
diversity(example_data2_diversity, index="invsimpson")
## [1] 4.734682
And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number so the value will be slightly different than what you’ve calculated above.
specpool(example_data1_diversity)
specpool(example_data2_diversity)
diversity(example_data1_diversity, index="shannon")
## [1] 1.730667
diversity(example_data2_diversity, index="shannon")
## [1] 1.799751
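In the spirit of doing things by hand first, the `shannon` index above can be sketched in base R as \(H = -\sum p_i \ln p_i\), with absent species dropped. The helper name `shannon` is made up for illustration, and the natural log is assumed here, which is vegan's default base.

```r
# Shannon index from a vector of counts per species
shannon = function(counts) {
  p = counts / sum(counts)  # fractional abundance of each species
  p = p[p > 0]              # drop absent species (0 * log(0) is taken as 0)
  -sum(p * log(p))          # natural log
}

# Occurrences from example data 1
counts1 = c(52, 1, 39, 6, 38, 2, 0, 0, 1, 0, 1, 23, 1, 1, 5)
shannon(counts1)  # about 1.73
```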
In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.
For your sample:
What are the Simpson Reciprocal Indices for your sample and community using the R function?
Community: 15
These values match your previous calculations.
If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.
Can you think of alternative ways to cluster or bin your data that might change the observed number of species?
How might different sequencing technologies influence observed diversity in a sample?
Callahan, BJ, McMurdie, PJ, Holmes, SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal. 11:2639.
Gaudet, AD, Ramer, LM, Nakonechny, J, Cragg, JJ, Ramer, MS. 2010. Small-group learning in an upper-level university biology class enhances academic performance and student attitudes toward group work. PloS One. 5:e15821. doi: 10.1371/journal.pone.0015821.
Hallam, SJ, Torres-Beltrán M, Hawley, AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data. 4:.
Hawley, AK, Torres-Beltrán M, Zaikova, E, Walsh, DA, Mueller, A, Scofield, M, Kheirandish, S, Payne, C, Pakhomova, L, Bhatia, M. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data. 4:170160.
Kunin, V, Engelbrektson, A, Ochman, H, Hugenholtz, P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12:118-123. doi: 10.1111/j.1462-2920.2009.02051.x. PMID: 19725865.
Sogin, ML, Morrison, HG, Huber, JA, Welch, DM, Huse, SM, Neal, PR, Arrieta, JM, Herndl, GJ. 2006. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere”. Proc. Natl. Acad. Sci. U. S. A. 103:12115-12120. doi: 10.1073/pnas.0605127103. PMID: 16880384.
Torres-Beltrán, M, Hawley, AK, Capelle, D, Zaikova, E, Walsh, DA, Mueller, A, Scofield, M, Payne, C, Pakhomova, L, Kheirandish, S. 2017. A compendium of geochemical information from the Saanich Inlet water column. Scientific Data. 4:170159.
Welch, RA, Burland, V, Plunkett, G, Redford, P, Roesch, P, Rasko, D, Buckles, EL, Liou, S-, Boutin, A, Hackett, J, Stroud, D, Mayhew, GF, Rose, DJ, Zhou, S, Schwartz, DC, Perna, NT, Mobley, HLT, Donnenberg, MS, Blattner, FR. 2002. Extensive Mosaic Structure Revealed by the Complete Genome Sequence of Uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 99:17020-17024. doi: 10.1073/pnas.252529799.